Data visualization

Author

Oleksandr Koval

Mission

This document created to make easier understanding data that we are working.

Introduction

In this document we will observe some tables:

  • ds_year ( describes the number of titles and the number of copies for each year, since 2018)

  • ds_genre ( describes the number of titles and the number of copies for each year, by genre)

  • ds_georaphy ( describes the number of titles and the number of copies for each year, by place)

  • ds_language (describes the number of titles and the number of copies for each year, by language)

  • ds_pubtype ( describes the number of titles and the number of copies for each year, by publication type)

  • ds_ukr_rus (describes the number of titles and the number of copies, the percentage of Ukrainian and russian books for each year)

P.s: all table have information for 2005 - 2024, expect ds_language

Preparing

Load libraries that are used in this report

load libraries
library(dplyr)
library(ggplot2)
library(tidyr)
library(plotly)

Data import

We will be working with the data that describes the two main measures: the number of titles (title_count) and the number of copies (copy_count). We have specific tables that describe these measures under different conditions.

Load data
getwd()
ds_genre <- readRDS("../../data-private/derived/manipulation/ds_genre.rds") # load ds_genre

ds_year <- readRDS("../../data-private/derived/manipulation/ds_year.rds") # load ds_year

ds_language  <- readRDS("../../data-private/derived/manipulation/ds_language.rds") # load ds_language

ds_geography  <- readRDS("../../data-private/derived/manipulation/ds_geography.rds") # load ds_geography

ds_pubtype  <- readRDS("../../data-private/derived/manipulation/ds_pubtype.rds") # load ds_pubtype

ds_ukr_rus  <- readRDS("../../data-private/derived/manipulation/ds_ukr_rus.rds") # load ds_ukr_rus

What we going to do?

We want to do a short visualization for each table, to make easier understanding the data (you observe few graphs that describe the data)

Lets start !

DS_YEAR

This table has 40 rows, and 3 columns. Also, in the measure column we have only two possible values - title_count means the number of titles and copy_count means the number of copies. In the column value we have the numerical numbers for both measures.

The first graph observe the number of copies for each year.

Making graph
years_all <- seq(min(ds_year$yr), max(ds_year$yr))

g_year_copy <- ds_year %>% 
  filter(measure == "copy_count") %>% 
  ggplot(aes(x = yr, y = value)) +
  geom_point() +
  geom_line() +
  scale_x_continuous(breaks = years_all) +             
  scale_y_continuous(breaks = seq(0, 80000, by = 10000), limits = c(0, 80000)) +  
  labs(
    title = "Number of Copies by Year"
    , x = "The year"
    , y = "Number of copies (ths.)"
  )+
  theme_minimal() +
  theme(
    panel.grid.minor = element_blank()
    ,panel.border = element_rect(color = "black", fill = NA, size = 1)
    ) 

g_year_copy_plotly <- ggplotly(g_year_copy)

g_year_copy_plotly

The second graph observe the number of titles for each year.

Making graph
g_year_title <- ds_year %>% 
  filter(
    measure == "title_count"
  ) %>% 
  ggplot(aes(x = yr, y = value)) +
  geom_point() +
  geom_line() +
  scale_x_continuous(breaks = years_all) +
  scale_y_continuous(breaks = seq(0, 27500, by = 5000), limits = c(7500, 27500)) +
  theme_minimal() +
  labs(
    title = "Number of Titles by Year"
    , x = "The year"
    , y = "Number of titles"
  ) +
  theme(panel.grid.minor = element_blank()
        ,panel.border = element_rect(color = "black", fill = NA, size = 1)) 


g_year_title_plotly <- ggplotly(g_year_title)

g_year_title_plotly
Making graph
rm(g_year_title, g_year_copy, years_all)

What we can observe:

  • the most productive year for new titles and the year with most copies

  • the production trend

DS_GENRE

This table has 40 rows, and 15 columns. Also, in the measure column we have only two possible values - title_count means the number of titles and copy_count means the number of copies. From the third column, each column is a separate genre.

In a first graph, we see the number of copies of each genre, by year.

Making graph
ds_genre_copy <- 
  ds_genre %>% 
  filter(
    measure == "copy_count"
  ) %>% 
  group_by(yr)  %>%
  pivot_longer(
    cols = -c(yr, measure),  # adding genre column
    names_to = "genre",
    values_to = "value"
  )


g_genre_copy<- ds_genre_copy %>%
  ggplot(aes(x = genre, y = value, fill = genre)) +
  geom_col() +
  facet_wrap(~ yr, ncol = 4) +
  scale_y_continuous(breaks = seq(0, 30000, by = 7500), limits = c(0, 37000)) +
  labs(
    title = "Number of Copies by Genre and Year",
    x = "Genre",
    y = "Number of Copies (ths.)",
    fill = "Genre"
  ) +
  theme_get() +
  theme(
    axis.text.x = element_blank(),      
    axis.title.x = element_blank() 
    ,legend.position = "bottom"
    , panel.spacing.y = unit(5, "lines")
  )

g_genre_copy_plotly <- ggplotly(g_genre_copy) %>%
  layout(
    legend = list(
      orientation = "h",           
      x = 0.5,                     
      y = -0.1,                   
      xanchor = "center"
    )
  )

g_genre_copy_plotly

In a second graph, we can see the number of titles for each genre, by year.

Making graph
ds_genre_title <- 
  ds_genre %>% 
  filter(
    measure == "title_count"
  ) %>% 
  group_by(yr) %>%
  pivot_longer(
    cols = -c(yr, measure),  # adding genre column
    names_to = "genre",
    values_to = "value"
  )

g_genre_title <- ds_genre_title %>%
  ggplot(aes(x = genre, y = value, fill = genre)) +
  geom_col() +
  facet_wrap(~ yr, ncol = 4) +
  labs(
    title = "Number of Titles by Genre and Year",
    x = "Genre",
    y = "Number of Titles",
    fill = "Genre"
  ) +
  theme_get() +
  theme(
    axis.text.x = element_blank(),      
    axis.title.x = element_blank()  
    ,legend.position = "bottom"
    ,  panel.spacing.y = unit(5, "lines"),

  )

g_genre_title_plotly <- ggplotly(g_genre_title) %>%
  layout(
    legend = list(
      orientation = "h",
      x = 0.5,
      y = -0.1,
      xanchor = "center"
    )
  )

g_genre_title_plotly
Making graph
rm(g_genre_copy, ds_genre_copy, ds_genre_title, g_genre_title)

DS_GEOGRAPHY

This table has 40 rows, and 28 columns. Also, in the measure column we have only two possible values - title_count means the number of titles and copy_count means the number of copies. From the third column, each column is a separate geographical place.

do some preparing
region_groups <- list(
  "Західна Україна" = c("Львівська", "Івано-Франківська", "Закарпатська", "Тернопільська", "Чернівецька", "Волинська", "Рівненська"),
  "Центральна Україна" = c("Київська", "Житомирська", "Черкаська", "Кіровоградська", "Полтавська", "Вінницька"),
  "Південна Україна" = c("Одеська", "Миколаївська", "Херсонська", "Запорізька"),
  "Східна Україна" = c("Харківська", "Донецька", "Луганська", "Дніпропетровська"),
  "Північна Україна" = c("Чернігівська", "Сумська")
  , "Крим" = "Автономна Республіка Крим"
)

region_group_df <- tibble::tibble(
  region_name = unlist(region_groups),
  group = rep(names(region_groups), times = sapply(region_groups, length))
)

g_geography <- ds_geography %>%
  pivot_longer(
    cols = -c(yr, measure),    
    names_to = "region_name",
    values_to = "value"
  )  %>%
  left_join(region_group_df, by = "region_name") %>%
  select(yr, measure, region_name, group, everything())

Inspecting the number of copies

Central Ukraine (Copy count)

This graph describes the number of copies for each region in Central Ukraine

Making graph
g_geography_central_copies <- g_geography %>%
  filter(measure == "copy_count", group == "Центральна Україна") %>%
  complete(yr, region_name, fill = list(value = 0)) %>%
  ggplot(aes(x = region_name, y = value, fill = region_name)) +
  geom_col() +
  facet_wrap(~ yr, ncol = 4) +
  scale_y_continuous(breaks = seq(0, 750, by = 250)) +      
  coord_cartesian(ylim = c(0, 750)) +                       
  labs(
    title = "Number of Copies by Region (Central Ukraine) and Year",
    x = "Region",
    y = "Number of Copies (ths.)",
    fill = "Region"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_blank(),
    axis.title.x = element_blank()
     ,  panel.spacing.y = unit(6, "lines")
     ,  panel.spacing.x = unit(2, "lines")

  )

g_geography_central_copies_plotly <- ggplotly(g_geography_central_copies) %>%
  layout(
    legend = list(
      orientation = "h",
      x = 0.5,
      y = -0.1,
      xanchor = "center"
    )
  )

g_geography_central_copies_plotly
Making graph
rm(g_geography_central_copies)

North Ukraine (Copy count)

This graph describes the number of copies for each region in North Ukraine

Making graph
g_geography_pivnichna_copies <- g_geography %>%
  filter(measure == "copy_count", group == "Північна Україна") %>%
  complete(yr, region_name, fill = list(value = 0)) %>%
  ggplot(aes(x = region_name, y = value, fill = region_name)) +
  geom_col() +
  facet_wrap(~ yr, ncol = 4) +
  scale_y_continuous(breaks = seq(0, 500, by = 250)) +      
  coord_cartesian(ylim = c(0, 500)) +                       
  labs(
    title = "Number of Copies by Region (North Ukraine) and Year",
    x = "Region",
    y = "Number of Copies (ths.)",
    fill = "Region"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_blank(),
    axis.title.x = element_blank(),
    panel.spacing.y = unit(6, "lines"),
    panel.spacing.x = unit(2, "lines")
  )

g_geography_pivnichna_copies_plotly <- ggplotly(g_geography_pivnichna_copies) %>%
  layout(
    legend = list(
      orientation = "h",
      x = 0.5,
      y = -0.1,
      xanchor = "center"
    )
  )

g_geography_pivnichna_copies_plotly
Making graph
rm(g_geography_pivnichna_copies)

South Ukraine (Copy count)

This graph describes the number of copies for each region in South Ukraine

Making graph
g_geography_pivdena_copies <- g_geography %>%
  filter(measure == "copy_count", group == "Південна Україна") %>%
  complete(yr, region_name, fill = list(value = 0)) %>%
  ggplot(aes(x = region_name, y = value, fill = region_name)) +
  geom_col() +
  facet_wrap(~ yr, ncol = 4) +
  scale_y_continuous(breaks = seq(0, 500, by = 250)) +      
  coord_cartesian(ylim = c(0, 500)) +                       
  labs(
    title = "Number of Copies by Region (South Ukraine) and Year",
    x = "Region",
    y = "Number of Copies (ths.)",
    fill = "Region"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_blank(),
    axis.title.x = element_blank(),
    panel.spacing.y = unit(6, "lines"),
    panel.spacing.x = unit(2, "lines")
  )

g_geography_pivdena_copies_plotly <- ggplotly(g_geography_pivdena_copies) %>%
  layout(
    legend = list(
      orientation = "h",
      x = 0.5,
      y = -0.1,
      xanchor = "center"
    )
  )

g_geography_pivdena_copies_plotly
Making graph
rm(g_geography_pivdena_copies)

West Ukraine (Copy count)

This graph describes the number of copies for each region in West Ukraine

Making graph
g_geography_zahidna_copies <- g_geography %>%
  filter(measure == "copy_count", group == "Західна Україна") %>%
  complete(yr, region_name, fill = list(value = 0)) %>%
  ggplot(aes(x = region_name, y = value, fill = region_name)) +
  geom_col() +
  facet_wrap(~ yr, ncol = 4) +
  scale_y_continuous(breaks = seq(0, 1250, by = 500)) +      
  coord_cartesian(ylim = c(0, 1250)) +                       
  labs(
    title = "Number of Copies by Region (West Ukraine) and Year",
    x = "Region",
    y = "Number of Copies (ths.)",
    fill = "Region"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_blank(),
    axis.title.x = element_blank(),
    panel.spacing.y = unit(6, "lines"),
    panel.spacing.x = unit(2, "lines")
  )

g_geography_zahidna_copies_plotly <- ggplotly(g_geography_zahidna_copies) %>%
  layout(
    legend = list(
      orientation = "h",
      x = 0.5,
      y = -0.1,
      xanchor = "center"
    )
  )

g_geography_zahidna_copies_plotly
Making graph
rm(g_geography_zahidna_copies)

East Ukraine (Copy count)

This graph describes the number of copies for each region in East Ukraine

Making graph
g_geography_shidna_copies <- g_geography %>%
  filter(measure == "copy_count", group == "Східна Україна") %>%
  complete(yr, region_name, fill = list(value = 0)) %>%
  ggplot(aes(x = region_name, y = value, fill = region_name)) +
  geom_col() +
  facet_wrap(~ yr, ncol = 4) +
  scale_y_continuous(breaks = seq(0, 3500, by = 1000)) +      
  coord_cartesian(ylim = c(0, 3500)) +                       
  labs(
    title = "Number of Copies by Region (East Ukraine) and Year",
    x = "Region",
    y = "Number of Copies (ths.)",
    fill = "Region"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_blank(),
    axis.title.x = element_blank(),
    panel.spacing.y = unit(6, "lines"),
    panel.spacing.x = unit(2, "lines")
  )

g_geography_shidna_copies_plotly <- ggplotly(g_geography_shidna_copies) %>%
  layout(
    legend = list(
      orientation = "h",
      x = 0.5,
      y = -0.1,
      xanchor = "center"
    )
  )

g_geography_shidna_copies_plotly
Making graph
rm(g_geography_shidna_copies)

Crimea (Copy count)

This graph describes the number of copies for Crimea

Making graph
g_geography_krym_copies <- 
  g_geography %>% 
  filter(
    measure == "copy_count",
    group == "Крим"
  ) %>% 
  ggplot(aes(x = yr, y = value, group = 1)) +   # add group=1 for geom_line
  geom_point() +
  geom_line() +
  labs(
    title = "Number of Copies by Region (Krym) and Year",
    x = "Year",
    y = "Number of Copies (ths.)"
  ) +
  scale_y_continuous(breaks = seq(0, 1500, by = 250), limits = c(0, 1500)) +  
  scale_x_continuous(breaks = 2005:2024) +
  theme_minimal() +
  theme(
    panel.grid.minor = element_blank(),
    legend.position = "none"  # No legend needed, since only one line/region
  )

g_geography_krym_copies_plotly <- ggplotly(g_geography_krym_copies)

g_geography_krym_copies_plotly
Making graph
rm(g_geography_krym_copies)

Inspectin the number of titles

Central Ukraine (Titles count)

This graph describes the number of titles for each region in Central Ukraine

Making graph
g_geography_central_title <- g_geography %>%
  filter(measure == "title_count", group == "Центральна Україна") %>%
  complete(yr, region_name, fill = list(value = 0)) %>%
  ggplot(aes(x = region_name, y = value, fill = region_name)) +
  geom_col() +
  facet_wrap(~ yr, ncol = 4) +
  scale_y_continuous(breaks = seq(0, 750, by = 250)) +      
  coord_cartesian(ylim = c(0, 750)) +                       
  labs(
    title = "Number of Titles by Region (Central Ukraine) and Year",
    x = "Region",
    y = "Number of Titles",
    fill = "Region"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_blank(),
    axis.title.x = element_blank(),
    panel.spacing.y = unit(6, "lines"),
    panel.spacing.x = unit(2, "lines")
  )

g_geography_central_title_plotly <- ggplotly(g_geography_central_title) %>%
  layout(
    legend = list(
      orientation = "h",
      x = 0.5,
      y = -0.1,
      xanchor = "center"
    )
  )

g_geography_central_title_plotly
Making graph
rm(g_geography_central_title)

North Ukraine (Titles count)

This graph describes the number of titles for each region in North Ukraine

Making graph
g_geography_pivnichna_title <- g_geography %>%
  filter(measure == "title_count", group == "Північна Україна") %>%
  complete(yr, region_name, fill = list(value = 0)) %>%
  ggplot(aes(x = region_name, y = value, fill = region_name)) +
  geom_col() +
  facet_wrap(~ yr, ncol = 4) +
  scale_y_continuous(breaks = seq(0, 750, by = 250)) +      
  coord_cartesian(ylim = c(0, 750)) +                       
  labs(
    title = "Number of Titles by Region (North Ukraine) and Year",
    x = "Region",
    y = "Number of Titles",
    fill = "Region"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_blank(),
    axis.title.x = element_blank(),
    panel.spacing.y = unit(6, "lines"),
    panel.spacing.x = unit(2, "lines")
  )

g_geography_pivnichna_title_plotly <- ggplotly(g_geography_pivnichna_title) %>%
  layout(
    legend = list(
      orientation = "h",
      x = 0.5,
      y = -0.1,
      xanchor = "center"
    )
  )

g_geography_pivnichna_title_plotly
Making graph
rm(g_geography_pivnichna_title)

South Ukraine (Titles count)

This graph describes the number of titles for each region in South Ukraine

Making graph
g_geography_pivdena_title <- g_geography %>%
  filter(measure == "title_count", group == "Південна Україна") %>%
  complete(yr, region_name, fill = list(value = 0)) %>%
  ggplot(aes(x = region_name, y = value, fill = region_name)) +
  geom_col() +
  facet_wrap(~ yr, ncol = 4) +
  scale_y_continuous(breaks = seq(0, 750, by = 250)) +      
  coord_cartesian(ylim = c(0, 750)) +                       
  labs(
    title = "Number of Titles by Region (South Ukraine) and Year",
    x = "Region",
    y = "Number of Titles",
    fill = "Region"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_blank(),
    axis.title.x = element_blank(),
    panel.spacing.y = unit(6, "lines"),
    panel.spacing.x = unit(2, "lines")
  )

g_geography_pivdena_title_plotly <- ggplotly(g_geography_pivdena_title) %>%
  layout(
    legend = list(
      orientation = "h",
      x = 0.5,
      y = -0.1,
      xanchor = "center"
    )
  )

g_geography_pivdena_title_plotly
Making graph
rm(g_geography_pivdena_title)

West Ukraine (Titles count)

This graph describes the number of titles for each region in West Ukraine

Making graph
g_geography_zahidna_title <- g_geography %>%
  filter(measure == "title_count", group == "Західна Україна") %>%
  complete(yr, region_name, fill = list(value = 0)) %>%
  ggplot(aes(x = region_name, y = value, fill = region_name)) +
  geom_col() +
  facet_wrap(~ yr, ncol = 4) +
  scale_y_continuous(breaks = seq(0, 1250, by = 500)) +      
  coord_cartesian(ylim = c(0, 1250)) +                       
  labs(
    title = "Number of Titles by Region (West Ukraine) and Year",
    x = "Region",
    y = "Number of Titles",
    fill = "Region"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_blank(),
    axis.title.x = element_blank(),
    panel.spacing.y = unit(6, "lines"),
    panel.spacing.x = unit(2, "lines")
  )

g_geography_zahidna_title_plotly <- ggplotly(g_geography_zahidna_title) %>%
  layout(
    legend = list(
      orientation = "h",
      x = 0.5,
      y = -0.1,
      xanchor = "center"
    )
  )

g_geography_zahidna_title_plotly
Making graph
rm(g_geography_zahidna_title)

East Ukraine (Titles count)

This graph describes the number of titles for each region in East Ukraine

Making graph
g_geography_shidna_title <- g_geography %>%
  filter(measure == "title_count", group == "Східна Україна") %>%
  complete(yr, region_name, fill = list(value = 0)) %>%
  ggplot(aes(x = region_name, y = value, fill = region_name)) +
  geom_col() +
  facet_wrap(~ yr, ncol = 4) +
  scale_y_continuous(breaks = seq(0, 4000, by = 1000)) +      
  coord_cartesian(ylim = c(0, 4000)) +                       
  labs(
    title = "Number of Titles by Region (East Ukraine) and Year",
    x = "Region",
    y = "Number of Titles",
    fill = "Region"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_blank(),
    axis.title.x = element_blank(),
    panel.spacing.y = unit(6, "lines"),
    panel.spacing.x = unit(2, "lines")
  )

g_geography_shidna_title_plotly <- ggplotly(g_geography_shidna_title) %>%
  layout(
    legend = list(
      orientation = "h",
      x = 0.5,
      y = -0.1,
      xanchor = "center"
    )
  )

g_geography_shidna_title_plotly
Making graph
rm(g_geography_shidna_title)

Crimea (Titles count)

This graph describes the number of titles for Crimea

Making graph
g_geography_krym_title <- 
  g_geography %>% 
  filter(
    measure == "title_count",
    group == "Крим"
  ) %>% 
  ggplot(aes(x = yr, y = value, group = 1)) +  # group=1 for line plot
  geom_point() +
  geom_line() +
  labs(
    title = "Number of Titles by Region (Krym) and Year",
    x = "Year",
    y = "Number of Titles"
  ) +
  scale_y_continuous(breaks = seq(0, 1000, by = 250), limits = c(0, 1000)) +  
  scale_x_continuous(breaks = 2005:2024) +
  theme_minimal() +
  theme(
    panel.grid.minor = element_blank(),
    legend.position = "none"  # no legend needed
  )

g_geography_krym_title_plotly <- ggplotly(g_geography_krym_title)

g_geography_krym_title_plotly
Making graph
rm(g_geography_krym_title)

DS_LANGUAGE

This table has 14 rows, and 39 columns. Also, in the measure column we have only two possible values - title_count means the number of titles and copy_count means the number of copies. From the third column, each column is new language. We have information only for 2018 - 2024 years.

Do to inspection of this table, we will observe each year separately. Each graph display only those language, which have at least 10 different titles and 10.000 copies.

do some preparing
g_language <- ds_language %>%
  pivot_longer(
    cols = -c(yr, measure),     
    names_to = "language",
    values_to = "value"
  )

2018 (Language)

Making graph
g_language_2018 <- 
  g_language %>% 
  filter(
    yr == 2018,
    (measure == "copy_count" & value > 10) | (measure == "title_count" & value > 10)
  ) %>%
  mutate(
    measure = recode(measure,
                     copy_count = "Copies",         
                     title_count = "Titles"),
    language = recode(language, 
                      `Кількома мовами народів світу` = "Декілька") 
  ) %>%
  ggplot(aes(x = language, y = value, fill = measure)) +
  geom_col(position = "dodge") +
  scale_y_continuous(breaks = seq(0, 40000, by = 5000), limits = c(0, 40000)) +  
  labs(
    title = "Number of Copies and Titles (2018)",
    x = "Language",
    y = "Count",
    fill = "Measure"
  ) +
  theme_minimal() +
  theme(
    legend.position = "bottom"
    , axis.text.x = element_text(size = 8) 
  )

g_language_2018_plotly <- ggplotly(g_language_2018) %>%
  layout(
    legend = list(
      orientation = "h",
      x = 0.5,
      y = -0.15,
      xanchor = "center"
    )
  )

g_language_2018_plotly
Making graph
rm(g_language_2018)

2019 (Language)

Making graph
g_language_2019 <- 
  g_language %>% 
  filter(
    yr == 2019,
    (measure == "copy_count" & value > 10) | (measure == "title_count" & value > 10)
  ) %>%
  mutate(
    measure = recode(measure,
                     copy_count = "Copies",         
                     title_count = "Titles"),
    # Rename languages as needed
    language = recode(language, 
                      `Кількома мовами народів світу` = "Декілька")
  ) %>%
  ggplot(aes(x = language, y = value, fill = measure)) +
  geom_col(position = "dodge") +
  scale_y_continuous(breaks = seq(0, 55000, by = 10000), limits = c(0, 55000)) +  
  labs(
    title = "Number of Copies and Titles (2019)",
    x = "Language",
    y = "Count",
    fill = "Measure"
  ) +
  theme_minimal() +
  theme(
    legend.position = "bottom",
    axis.text.x = element_text(size = 8)
  )

g_language_2019_plotly <- ggplotly(g_language_2019) %>%
  layout(
    legend = list(
      orientation = "h",
      x = 0.5,
      y = -0.15,
      xanchor = "center"
    )
  )

g_language_2019_plotly
Making graph
rm(g_language_2019)

2020 (Language)

Making graph
g_language_2020 <- 
  g_language %>% 
  filter(
    yr == 2020,
    (measure == "copy_count" & value > 10) | (measure == "title_count" & value > 10)
  ) %>%
  mutate(
    measure = recode(measure,
                     copy_count = "Copies",         
                     title_count = "Titles"),
    # Rename language as needed:
    language = recode(language, 
                      `Кількома мовами народів світу` = "Декілька")
  ) %>%
  ggplot(aes(x = language, y = value, fill = measure)) +
  geom_col(position = "dodge") +
  scale_y_continuous(breaks = seq(0, 40000, by = 5000), limits = c(0, 40000)) +  
  labs(
    title = "Number of Copies and Titles (2020)",
    x = "Language",
    y = "Count",
    fill = "Measure"
  ) +
  theme_minimal() +
  theme(
    legend.position = "bottom",
    axis.text.x = element_text(size = 8)
  )

g_language_2020_plotly <- ggplotly(g_language_2020) %>%
  layout(
    legend = list(
      orientation = "h",
      x = 0.5,
      y = -0.15,
      xanchor = "center"
    )
  )

g_language_2020_plotly
Making graph
rm(g_language_2020)

2021 (Language)

Making graph
g_language_2021 <- 
  g_language %>% 
  filter(
    yr == 2021,
    (measure == "copy_count" & value > 10) | (measure == "title_count" & value > 10)
  ) %>%
  mutate(
    measure = recode(measure,
                     copy_count = "Copies",         
                     title_count = "Titles"),
    # Rename language as needed:
    language = recode(language, 
                      `Кількома мовами народів світу` = "Декілька")
  ) %>%
  ggplot(aes(x = language, y = value, fill = measure)) +
  geom_col(position = "dodge") +
  scale_y_continuous(breaks = seq(0, 40000, by = 5000), limits = c(0, 40000)) +  
  labs(
    title = "Number of Copies and Titles (2021)",
    x = "Language",
    y = "Count",
    fill = "Measure"
  ) +
  theme_minimal() +
  theme(
    legend.position = "bottom",
    axis.text.x = element_text(size = 8)
  )

g_language_2021_plotly <- ggplotly(g_language_2021) %>%
  layout(
    legend = list(
      orientation = "h",
      x = 0.5,
      y = -0.15,
      xanchor = "center"
    )
  )

g_language_2021_plotly
Making graph
rm(g_language_2021)

2022 (Language)

Making graph
g_language_2022 <- 
  g_language %>% 
  filter(
    yr == 2022,
    (measure == "copy_count" & value > 10) | (measure == "title_count" & value > 10)
  ) %>%
  mutate(
    measure = recode(measure,
                     copy_count = "Copies",         
                     title_count = "Titles"),
    # Optional: rename any language here
    language = recode(language, 
                      `Кількома мовами народів світу` = "Декілька")
  ) %>%
  ggplot(aes(x = language, y = value, fill = measure)) +
  geom_col(position = "dodge") +
  scale_y_continuous(breaks = seq(0, 15000, by = 5000), limits = c(0, 15000)) +  
  labs(
    title = "Number of Copies and Titles (2022)",
    x = "Language",
    y = "Count",
    fill = "Measure"
  ) +
  theme_minimal() +
  theme(
    legend.position = "bottom",
    axis.text.x = element_text(size = 8)
  )

g_language_2022_plotly <- ggplotly(g_language_2022) %>%
  layout(
    legend = list(
      orientation = "h",
      x = 0.5,
      y = -0.15,
      xanchor = "center"
    )
  )

g_language_2022_plotly
Making graph
rm(g_language_2022)

2023 (Language)

Making graph
g_language_2023 <- 
  g_language %>% 
  filter(
    yr == 2023,
    (measure == "copy_count" & value > 10) | (measure == "title_count" & value > 10)
  ) %>%
  mutate(
    measure = recode(measure,
                     copy_count = "Copies",         
                     title_count = "Titles"),
    # Optional: rename any language as needed
    language = recode(language, 
                      `Кількома мовами народів світу` = "Декілька")
  ) %>%
  ggplot(aes(x = language, y = value, fill = measure)) +
  geom_col(position = "dodge") +
  scale_y_continuous(breaks = seq(0, 25000, by = 5000), limits = c(0, 25000)) +  
  labs(
    title = "Number of Copies and Titles (2023)",
    x = "Language",
    y = "Count",
    fill = "Measure"
  ) +
  theme_minimal() +
  theme(
    legend.position = "bottom",
    axis.text.x = element_text(size = 8)
  )

g_language_2023_plotly <- ggplotly(g_language_2023) %>%
  layout(
    legend = list(
      orientation = "h",
      x = 0.5,
      y = -0.15,
      xanchor = "center"
    )
  )

g_language_2023_plotly
Making graph
rm(g_language_2023)

2024 (Language)

Making graph
g_language_2024 <- 
  g_language %>% 
  filter(
    yr == 2024,
    (measure == "copy_count" & value > 10) | (measure == "title_count" & value > 10)
  ) %>%
  mutate(
    measure = recode(measure,
                     copy_count = "Copies",         
                     title_count = "Titles"),
    # Optional: rename any language as needed
    language = recode(language, 
                      `Кількома мовами народів світу` = "Декілька")
  ) %>%
  ggplot(aes(x = language, y = value, fill = measure)) +
  geom_col(position = "dodge") +
  scale_y_continuous(breaks = seq(0, 35000, by = 5000), limits = c(0, 35000)) +  
  labs(
    title = "Number of Copies and Titles (2024)",
    x = "Language",
    y = "Count",
    fill = "Measure"
  ) +
  theme_minimal() +
  theme(
    legend.position = "bottom",
    axis.text.x = element_text(size = 8)
  )

g_language_2024_plotly <- ggplotly(g_language_2024) %>%
  layout(
    legend = list(
      orientation = "h",
      x = 0.5,
      y = -0.15,
      xanchor = "center"
    )
  )

g_language_2024_plotly
Making graph
rm(g_language_2024)

DS_PUBTYPE

This table has 40 rows, and 16 columns. Also, in the measure column we have only two possible values - title_count means the number of titles and copy_count means the number of copies. From the third column, each column is new publication type.

First graph describes the number of copies of each publication type, by year.

Making graph
ds_pubtype_l <- ds_pubtype %>%
  pivot_longer(
    cols = -c(yr, measure),  
    names_to = "pubtype",      
    values_to = "value"       
  )


ds_pubtype_copy <- 
  ds_pubtype_l %>%
  filter(measure == "copy_count") %>%
  group_by(yr)

g_pubtype_copy <- ds_pubtype_copy %>%
  ggplot(aes(x = pubtype, y = value, fill = pubtype)) +
  geom_col() +
  facet_wrap(~ yr, ncol = 4) +
  scale_y_continuous(breaks = seq(0, 30000, by = 7500), limits = c(0, 37000)) +
  coord_cartesian(ylim = c(0, 37000)) +
  labs(
    title = "Number of Copies by Publication Type and Year",
    x = "Publication Type",
    y = "Number of Copies (ths.)",
    fill = "Publication Type"
  ) +
  theme_get() +
  theme(
    axis.text.x = element_blank(),
    axis.title.x = element_blank()
        ,legend.position = "bottom"
    , panel.spacing.y = unit(5, "lines")
  )

g_pubtype_copy_plotly <- ggplotly(g_pubtype_copy) %>%
  layout(
    legend = list(
      orientation = "h",
      x = 0.5,
      y = -0.1,
      xanchor = "center"
    )
  )

g_pubtype_copy_plotly
Making graph
rm(ds_pubtype_l, ds_pubtype_copy, g_pubtype_copy)

Second graph describes the number of titles of each publication type, by year.

Making graph
ds_pubtype_l <- ds_pubtype %>%
  pivot_longer(
    cols = -c(yr, measure),  
    names_to = "pubtype",      
    values_to = "value"       
  )

ds_pubtype_title <- 
  ds_pubtype_l %>%
  filter(measure == "title_count") %>%
  group_by(yr)

g_pubtype_title <- ds_pubtype_title %>%
  ggplot(aes(x = pubtype, y = value, fill = pubtype)) +
  geom_col() +
  facet_wrap(~ yr, ncol = 4) +
  scale_y_continuous(breaks = seq(0, 9000, by = 2500), limits = c(0, 9000)) +
  coord_cartesian(ylim = c(0, 9000)) +
  labs(
    title = "Number of Titles by Publication Type and Year",
    x = "Publication Type",
    y = "Number of Titles",
    fill = "Publication Type"
  ) +
  theme_get() +
  theme(
    axis.text.x = element_blank(),
    axis.title.x = element_blank(),
    legend.position = "bottom",
    panel.spacing.y = unit(5, "lines")
  )

g_pubtype_title_plotly <- ggplotly(g_pubtype_title) %>%
  layout(
    legend = list(
      orientation = "h",
      x = 0.5,
      y = -0.1,
      xanchor = "center"
    )
  )

g_pubtype_title_plotly
Making graph
rm(ds_pubtype_title, g_pubtype_title)

DS_UKR_RUS

This table has 40 rows, and 16 columns. Also, in the measure column we have only two possible values - title_count means the number of titles and copy_count means the number of copies. Third column describes the percentage Ukrainian versions to ukr and rus versions. Froths, describes the percentage of russian versions to ukr and rus versions.

First graph describer the number of copies of Ukrainian and russian copies for each year since 2005 to 2024.

Making graph
g_ukrrus_copies <- 
  ds_ukr_rus %>%
  filter(measure == "copy_count") %>%
  select(yr, ukr, rus) %>%
  pivot_longer(
    cols = c(ukr, rus),
    names_to = "language",
    values_to = "value"
  ) %>% 
  ggplot(aes(x = factor(yr), y = value, fill = language)) +
  geom_col(position = "dodge") +
  scale_x_discrete(expand = expansion(mult = c(0))) +
  labs(
    title = "Number of Copies: Ukrainian vs Russian by Year",
    x = "Year",
    y = "Number of Copies (ths.)",
    fill = "Language"
  ) +
  theme_minimal() +
  theme(
    legend.position = "bottom"
  )

g_ukrrus_copies_plotly <- ggplotly(g_ukrrus_copies) %>%
  layout(
    legend = list(
      orientation = "h",
      x = 0.5,
      y = -0.15,
      xanchor = "center"
    )
  )

g_ukrrus_copies_plotly
Making graph
rm(g_ukrrus_copies)

Second graph describer the number of titles of Ukrainian and russian titles for each year since 2005 to 2024.

Making graph
g_ukrrus_title <- 
  ds_ukr_rus %>%
  filter(measure == "title_count") %>%
  select(yr, ukr, rus) %>%
  pivot_longer(
    cols = c(ukr, rus),
    names_to = "language",
    values_to = "value"
  ) %>% 
  ggplot(aes(x = factor(yr), y = value, fill = language)) +
  geom_col(position = "dodge") +
  scale_x_discrete(expand = expansion(mult = c(0))) +
  scale_y_continuous(breaks = seq(0, 20000, by = 5000), limits = c(0, 20000)) +
  labs(
    title = "Number of Titles: Ukrainian vs Russian by Year",
    x = "Year",
    y = "Number of Titles",
    fill = "Language"
  ) +
  theme_minimal() +
  theme(
    legend.position = "bottom"
  )

g_ukrrus_title_plotly <- ggplotly(g_ukrrus_title) %>%
  layout(
    legend = list(
      orientation = "h",
      x = 0.5,
      y = -0.15,
      xanchor = "center"
    )
  )

g_ukrrus_title_plotly
Making graph
rm(g_ukrrus_title)

Overral

This document created to help understand the date more deeply. I hope it was useful for you. 😉